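Today, creators of data-hungry deep neural networks (DNNs) scour the Internet for training material, leaving users with little control over, or knowledge of, when their data is used for model training. To enable users to counteract unwanted data use, we design, implement, and evaluate a practical system that lets users detect whether their data was used to train a DNN model. We show how users can create special data points, which we call isotopes, that introduce "spurious features" into DNNs during training. With only query access to a trained model, and with no knowledge of the training process or control over the data labels, a user can apply statistical hypothesis testing to detect whether a model has learned the spurious features associated with their isotopes by training on the user's data. This effectively turns DNNs' vulnerability to memorization and spurious correlations into a tool for data provenance. Our results confirm efficacy in multiple settings, detecting and distinguishing between hundreds of isotopes with high accuracy. We further show that our system works on public ML-as-a-service platforms and on larger models (e.g., ImageNet), can use physical objects instead of digital marks, and remains generally robust against several adaptive countermeasures.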
translated by 谷歌翻译
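The detection step in the isotope abstract above can be illustrated with a small sketch. It assumes the user has queried the model on the same images with and without their mark and recorded the probability assigned to one class of interest. The paired sign-flip permutation test, the function name, and the probability values are illustrative assumptions, not the specific test or data used by the authors.

```python
import random


def paired_permutation_test(marked_probs, clean_probs, n_resamples=10000, seed=0):
    """One-sided paired sign-flip permutation test (illustrative choice of test).

    H0: adding the mark does not raise the model's probability for the class.
    Returns the p-value for the observed mean increase.
    """
    if len(marked_probs) != len(clean_probs):
        raise ValueError("inputs must be paired")
    diffs = [m - c for m, c in zip(marked_probs, clean_probs)]
    observed = sum(diffs) / len(diffs)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under H0 each paired difference is equally likely to be + or -.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if flipped >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical query results (made-up numbers): probability the model assigns
# to the class of interest for the same images with and without the mark.
marked = [0.62, 0.55, 0.71, 0.48, 0.66, 0.59, 0.73, 0.52, 0.64, 0.69]
clean = [0.11, 0.09, 0.15, 0.12, 0.08, 0.14, 0.10, 0.13, 0.07, 0.12]

p_value = paired_permutation_test(marked, clean)
detected = p_value < 0.05  # significance level is an assumption
```

A small p-value suggests the mark systematically shifts the model's output, which is consistent with the model having trained on marked data; a large p-value means the test found no such evidence.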
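We introduce ArtBench-10, the first class-balanced, high-quality, cleanly annotated, and standardized dataset for benchmarking artwork generation. It comprises 60,000 artwork images from 10 distinctive artistic styles, with 5,000 training images and 1,000 test images per style. ArtBench-10 has several advantages over previous artwork datasets. First, it is class-balanced, whereas most previous artwork datasets suffer from long-tailed class distributions. Second, the images are of high quality and carry clean annotations. Third, ArtBench-10 is created with standardized data collection, annotation, filtering, and preprocessing procedures. We provide three versions of the dataset at different resolutions ($32\times32$, $256\times256$, and original image size), formatted so that they are easy to incorporate into popular machine learning frameworks. We also conduct extensive benchmarking experiments using representative image synthesis models with ArtBench-10 and present an in-depth analysis. The dataset is available at https://github.com/liaopeiyuan/artbench under a Fair Use license.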
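In conventional object detection frameworks, a backbone inherited from image recognition models extracts deep latent features, and a neck module then fuses these latent features to capture information at different scales. Because the input resolution in object detection is much larger than in image recognition, the computational cost of the backbone often dominates the total inference cost. This heavy-backbone design paradigm is mostly a historical legacy of transferring image recognition models to object detection, rather than the result of an end-to-end optimized design for object detection. In this work, we show that this paradigm indeed leads to sub-optimal object detection models. To this end, we propose a novel heavy-neck paradigm, GiraffeDet, a giraffe-like network for efficient object detection. GiraffeDet uses an extremely lightweight backbone and a very deep neck module that enables dense information exchange across different spatial scales and different levels of latent semantics simultaneously. This design paradigm allows the detector to process high-level semantic information and low-level spatial information with the same priority, even in the early stages of the network, making it more effective in detection tasks. Numerical evaluations on multiple popular object detection benchmarks show that GiraffeDet consistently outperforms previous SOTA models across a wide range of resource constraints. Source code is available at https://github.com/jyqi/giraffedet.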
Graph neural networks (GNNs) have been increasingly deployed in applications that involve learning on non-Euclidean data. However, recent studies show that GNNs are vulnerable to graph adversarial attacks. Although several defense methods improve GNN robustness by eliminating adversarial components, they may also impair the underlying clean graph structure that contributes to GNN training. In addition, few of these defense models scale to large graphs because of their high computational complexity and memory usage. In this paper, we propose GARNET, a scalable spectral method to boost the adversarial robustness of GNN models. GARNET first leverages weighted spectral embedding to construct a base graph, which is not only resistant to adversarial attacks but also contains the critical (clean) graph structure for GNN training. Next, GARNET refines the base graph by pruning additional uncritical edges based on a probabilistic graphical model. GARNET has been evaluated on various datasets, including a large graph with millions of nodes. Our extensive experimental results show that GARNET achieves adversarial accuracy improvement and runtime speedup over state-of-the-art GNN (defense) models by up to 13.27% and 14.7x, respectively.
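In recent deep image compression neural networks, the entropy model plays a critical role in estimating the prior distribution of deep image encodings. Existing methods combine a hyperprior with local context in the entropy estimation function. Lacking a global view, this greatly limits their performance. In this work, we propose a novel global reference model for image compression that effectively leverages both local and global context information, leading to an enhanced compression rate. The proposed method scans the decoded latents and then finds the most relevant latent to assist in estimating the distribution of the current latent. A by-product of this work is a novel mean-shifting GDN module that further improves performance. Experimental results show that the proposed model outperforms most state-of-the-art methods in the industry in rate-distortion performance.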
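The reference-search step described in the image compression abstract above (scan the decoded latents, pick the most relevant one) can be sketched as a nearest-neighbor lookup. The abstract does not state how relevance is measured or what the query is, so the cosine similarity, the query vector (assumed here to summarize the already-decoded context of the current position), and all function names below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_global_reference(decoded_latents, query):
    """Scan all decoded latents and return (index, similarity) of the best match.

    Returns None when nothing has been decoded yet. Cosine similarity is an
    assumed relevance measure, not one taken from the abstract.
    """
    best = None
    for index, latent in enumerate(decoded_latents):
        score = cosine_similarity(latent, query)
        if best is None or score > best[1]:
            best = (index, score)
    return best


# Toy example with made-up 2-D latents.
decoded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
reference = find_global_reference(decoded, [0.9, 1.1])
```

In a real codec the selected reference would be fed, together with the hyperprior and local context, into the entropy model; that part is omitted here.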
In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series. DAMO-YOLO extends YOLO with several new techniques, including Neural Architecture Search (NAS), an efficient Reparameterized Generalized-FPN (RepGFPN), a lightweight head with AlignedOTA label assignment, and distillation enhancement. In particular, we use MAE-NAS, a method guided by the principle of maximum entropy, to search our detection backbone under the constraints of low latency and high performance, producing ResNet-like / CSP-like structures with spatial pyramid pooling and focus modules. In the design of necks and heads, we follow the rule of "large neck, small head". We import Generalized-FPN with accelerated queen-fusion to build the detector neck and upgrade its CSPNet with efficient layer aggregation networks (ELAN) and reparameterization. We then investigate how detector head size affects detection performance and find that a heavy neck with only one task projection layer yields better results. In addition, AlignedOTA is proposed to solve the misalignment problem in label assignment, and a distillation scheme is introduced to further improve performance. Building on these techniques, we construct a suite of models at various scales to meet the needs of different scenarios, i.e., DAMO-YOLO-Tiny/Small/Medium. They achieve 43.0/46.8/50.0 mAP on COCO with latencies of 2.78/3.83/5.62 ms on T4 GPUs, respectively. The code is available at https://github.com/tinyvision/damo-yolo.
Graph structure learning aims to learn connectivity in a graph from data. It is particularly important for many computer vision tasks, since in most cases no explicit graph structure is available for images. A natural way to construct a graph among images is to treat each image as a node and assign pairwise image similarities as weights to the corresponding edges. It is well known that pairwise similarities between images are sensitive to noise in the feature representations, leading to unreliable graph structures. We address this problem from the viewpoint of statistical tests. By viewing the feature vector of each node as an independent sample, the decision of whether to create an edge between two nodes based on their similarity in feature representation can be thought of as a single statistical test. To make the edge decision more robust, multiple samples are drawn and integrated by multiple statistical tests to generate a more reliable similarity measure and, consequently, a more reliable graph structure. The corresponding matrix form, named $\mathcal{B}$-Attention, is designed for efficiency. The effectiveness of multiple tests for graph structure learning is verified both theoretically and empirically on multiple clustering and ReID benchmark datasets. Source code is available at https://github.com/Thomas-wyh/B-Attention.
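To make the contrast between a single test and multiple tests concrete, here is a minimal sketch. It is not the paper's B-Attention formulation: the thresholded cosine test, the shared-neighbor (Jaccard) score used to combine the tests, and every name and number below are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def single_test(features, i, j, threshold):
    """One test: do nodes i and j look similar from their own features alone?"""
    return cosine(features[i], features[j]) > threshold


def passing_set(features, i, threshold):
    """All nodes k whose single test against node i passes (i itself included)."""
    return {k for k in range(len(features))
            if k == i or single_test(features, i, k, threshold)}


def multiple_test_similarity(features, i, j, threshold):
    """Combine many single tests: Jaccard overlap of the two passing sets.

    Each other node k acts as one extra sample; i and j are scored as similar
    when their single-test outcomes against those samples largely coincide.
    """
    set_i = passing_set(features, i, threshold)
    set_j = passing_set(features, j, threshold)
    return len(set_i & set_j) / len(set_i | set_j)


# Toy features (made up): nodes 0-2 form one cluster, nodes 3-4 another.
features = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9]]
same_cluster = multiple_test_similarity(features, 0, 1, threshold=0.8)
different_cluster = multiple_test_similarity(features, 0, 3, threshold=0.8)
```

Because the combined score depends on many pairwise comparisons rather than one, noise in a single feature vector has less influence on whether an edge is created.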
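Nowadays, real data for the person re-identification (ReID) task faces privacy issues, as with the now-banned DukeMTMC-ReID dataset. Collecting real data for ReID has therefore become harder. Meanwhile, the labor cost of labeling remains high, further hindering the development of ReID research. Many methods therefore turn to generating synthetic images for ReID algorithms as an alternative to real images. However, there is an inevitable domain gap between synthetic and real images. In previous methods, the generation process is based on virtual scenes, and the synthetic training data cannot be adapted automatically to different real target scenes. To address this problem, we propose a novel target-aware generation pipeline, called TAGPerson, for producing synthetic person images. Specifically, it involves a parameterized rendering method whose parameters are controllable and can be adjusted according to the target scene. In TAGPerson, we extract information from the target scene and use it to control the parameterized rendering process, generating target-aware synthetic images that have a smaller gap to the real images in the target domain. In our experiments, our target-aware synthetic images achieve much higher performance than generalized synthetic images on MSMT17, i.e., 47.5% vs. 40.9% rank-1 accuracy. We will release this toolkit (code is available at https://github.com/tagperson/tagperson-blender) so that the ReID community can generate synthetic images in any desired style.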
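In object detection models, the detection backbone consumes more than half of the overall inference cost. Recent research attempts to reduce this cost by optimizing the backbone architecture with the help of Neural Architecture Search (NAS). However, existing NAS methods for object detection require hundreds to thousands of GPU hours of searching, making them impractical in fast-paced research and development. In this work, we propose a novel zero-shot NAS method to address this issue. The proposed method, named ZenDet, automatically designs efficient detection backbones without training network parameters, reducing the architecture design cost to nearly zero while delivering state-of-the-art (SOTA) performance. Under the hood, ZenDet maximizes the differential entropy of detection backbones, leading to a better feature extractor for object detection under the same computational budget. After merely one GPU day of fully automatic design, ZenDet produces SOTA detection backbones on multiple detection benchmark datasets with little human intervention. Compared with a ResNet-50 backbone, ZenDet is +2.0% better in mAP when using the same amount of FLOPs/parameters, and is 1.54 times faster on an NVIDIA V100 at the same mAP. Code and pre-trained models will be released later.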
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.